Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenCL] Change of OpenCL profiling logic #11180

Merged
merged 9 commits into from
May 10, 2022
Merged

Conversation

argrento
Copy link
Contributor

@argrento argrento commented Apr 29, 2022

Profiling in TVM is enabled or disable in compile time by the USE_PROFILER switch. It means that if we enable profile in config.cmake, but do not use any profiling features in the app, OpenCL is forced to collect cl_events objects.

Build TVM with set(USE_PROFILER ON).
Consider simple app, where we create module from the .so file:

tvm::runtime::Module mod_factory = tvm::runtime::Module::LoadFromFile("model.so");
tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
tvm::runtime::PackedFunc get_input = gmod.GetFunction("get_input");
tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");
tvm::runtime::PackedFunc run = gmod.GetFunction("run");

// set inputs and outputs

size_t niterations = 5000;
for (size_t i = 0; i < niterations; i++) {
  run();
}

Then we collect memory usage info with Valgrind.

    MB
818.5^                                                                       #
     |                                                                    @@@#
     |                                                                @@@@@@@#
     |                                                            @@@@@@@@@@@#
     |                                                         @@@@@@@@@@@@@@#
     |                                                     ::::@@@@@@@@@@@@@@#
     |                                                  @:::: :@@@@@@@@@@@@@@#
     |                                             @@@:@@:::: :@@@@@@@@@@@@@@#
     |                                           @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                      :@@@@@@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                                  :@@::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                               ::::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                           :@@@: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                       :::@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                   ::@@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |                @@@: @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     |           :::::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @@:::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
     | @ :::::::@:: ::@@ : @@: :@:@ @: ::@ ::@@@ @@@ @:@@:::: :@@@@@@@@@@@@@@#
   0 +----------------------------------------------------------------------->Gi
     0                                                                   75.95

We do not use any profiling info, but it is collected implicitly because of the compile-time switches:

  1. clCreateCommandQueue(this->context, did, CL_QUEUE_PROFILING_ENABLE, &err_code));
  2. OPENCL_CALL(clEnqueueNDRangeKernel(queue, kernel, work_dim, nullptr, wl.work_size,

With the proposed modifications this behavior is changed: clCommandQueue by default is created in the normal mode and is recreated with profiling capabilities when user calls profiler explicitly. When a profiling session is finished, the queue is recreated again in normal mode, which allows to mix profile() calls and run() calls.

With the proposed changes valgrind shows no abnormal memory usage for the example above.

    MB
148.9^#                                                                       
     |#::::::::::::::::::::::::::@:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
     |#: ::::::::::::::: ::::::: @:::::::::::::@:::@:::::@::::::@:::::@:::::@:
   0 +----------------------------------------------------------------------->Gi
     0                                                                   83.07

@argrento argrento changed the title Change of OpenCL profiling logic [OpenCL] Change of OpenCL profiling logic Apr 29, 2022
src/runtime/opencl/opencl_common.h Outdated Show resolved Hide resolved
@argrento argrento changed the title [OpenCL] Change of OpenCL profiling logic [WIP] [OpenCL] Change of OpenCL profiling logic Apr 30, 2022
src/runtime/opencl/opencl_common.h Outdated Show resolved Hide resolved
src/runtime/opencl/opencl_common.h Outdated Show resolved Hide resolved
src/runtime/opencl/opencl_common.h Outdated Show resolved Hide resolved
Copy link
Contributor

@echuraev echuraev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general LGTM. Thanks

@argrento argrento changed the title [WIP] [OpenCL] Change of OpenCL profiling logic [OpenCL] Change of OpenCL profiling logic May 9, 2022
@echuraev
Copy link
Contributor

@masahi could you please take a look at this PR?

@masahi masahi merged commit 0f6abea into apache:main May 10, 2022
@argrento argrento deleted the opencl_profiling branch May 12, 2022 10:08
@argrento argrento restored the opencl_profiling branch May 12, 2022 10:08
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request May 16, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
shtinsa pushed a commit to Deelvin/tvm that referenced this pull request May 17, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
shingjan pushed a commit to shingjan/tvm that referenced this pull request May 17, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
SebastianBoblest pushed a commit to SebastianBoblest/tvm that referenced this pull request May 27, 2022
* Enable profiling only when it is used explicitly

* Change logic of clCommandQueue create/destroy

* Update comments

* Linter fix

* Refactor queue create

* Move queue recreation logic to function

* Replace profiling flag by the queue info request

* Enhance readability

* Fix linter errors
@srkreddy1238
Copy link
Contributor

srkreddy1238 commented Sep 1, 2022

@argrento

This PR causes CLML profiling failure. Reason explained below

In general the workspaces can be accessed and shared via “device_api.opencl”. CLML integration shares the workspace created by default OpenCL and it has a reference to the command queue. Changing the command queue in between makes them invalid.

Why do we enable profiler in compilation when we don't want to profile any thing ?
Or
Is there any case where we enable profiling some thing else but not OpenCL / dynamically enable/disable profiling? In such cases we could think of having different compilation flag for OpenCL or environment varible to control at runtime launch.

At any case dynamically recreating the queue will cause issues for other components.

@masahi & @valmat07 comment pls.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants